Skip to content

Phase 1: connection-mode enum + backwards-compat (issue #5989)#6047

Open
MichaelUray wants to merge 19 commits intonetbirdio:mainfrom
MichaelUray:feat/connection-mode-phase1
Open

Phase 1: connection-mode enum + backwards-compat (issue #5989)#6047
MichaelUray wants to merge 19 commits intonetbirdio:mainfrom
MichaelUray:feat/connection-mode-phase1

Conversation

@MichaelUray
Copy link
Copy Markdown
Contributor

@MichaelUray MichaelUray commented May 1, 2026

Summary

This PR is Phase 1 of three that together implement issues #5989 and #5990 (the connection-mode consolidation RFC and its server-side per-peer/per-group companion).

In Scope:

  • New ConnectionMode proto enum with four reserved values (relay-forced, p2p, p2p-lazy, p2p-dynamic); three are functional in this PR. p2p-dynamic is reserved on the wire and in the DB but the daemon currently treats it like p2p (pass-through). The decoupled worker_relay/worker_ice lifecycle that makes p2p-dynamic distinct comes in Phase 2.
  • Two new account-level settings on the management server: relay_timeout_seconds, p2p_timeout_seconds (both nullable, NULL = built-in default).
  • Replaces the asymmetric Lazy/ForceRelay precedence in client/internal/conn_mgr.go with a single resolver: client env > client config > server-pushed.
  • Full backwards-compatibility:
    • Old clients still see only the legacy lazy_connection_enabled boolean (mapped from the resolved mode by toPeerConfig).
    • Old servers send connection_mode = UNSPECIFIED (proto default 0) and the new client falls back to the legacy boolean -- verified via tests.
    • NB_FORCE_RELAY, NB_ENABLE_EXPERIMENTAL_LAZY_CONN, --enable-lazy-connection, and NB_LAZY_CONN_INACTIVITY_THRESHOLD continue to work and emit one-shot deprecation warnings.

The companion Dashboard PR is at netbirdio/dashboard#627 (will be added once it exists).

Implementation map

Layer Files
proto shared/management/proto/management.proto, client/proto/daemon.proto
Mode type shared/connectionmode/ (Mode, ParseString, FromProto, ToProto, ResolveLegacyLazyBool, ToLazyConnectionEnabled)
client env / CLI client/internal/peer/env.go (ResolveModeFromEnv), client/cmd/{root,up}.go (--connection-mode, --relay-timeout, --p2p-timeout)
client engine / conn-mgr client/internal/{engine,conn_mgr,connect}.go (resolveConnectionMode + UpdatedRemotePeerConfig + EngineConfig fields)
client per-peer client/internal/peer/conn.go (Mode-driven skip-ICE branch)
mgmt server management/server/types/settings.go (new nullable columns), management/server/http/handlers/accounts/accounts_handler.go (PUT validation + response), management/server/account.go (audit emission), management/server/activity/codes.go (3 new event codes), management/internals/shared/grpc/conversion.go (toPeerConfig writes both old and new wire fields)
OpenAPI shared/management/http/api/openapi.yml + regenerated types.gen.go
tests shared/connectionmode/mode_test.go, client/internal/peer/env_test.go, client/internal/conn_mgr_test.go, management/internals/shared/grpc/conversion_test.go

Hardware-tested

Verified on a real production NetBird instance with 32 connected peers across 12 OpenWrt-router versions (22.03 to 25.12), Windows 10/11, Debian, Android 12/14, and iOS 26.3.1:

  • Pre-deploy baseline: 32 connected.
  • After deploying the Phase-1 management image: 32 connected, no disconnect, all old clients continued to function unchanged (they read only the legacy lazy_connection_enabled boolean which toPeerConfig keeps writing via the back-compat mapping).
  • Switched the account-wide connection_mode between p2p and p2p-lazy via the API: 32 connected throughout, no disconnect.
  • 8-minute monitoring window after deploy: zero peer-count drift.

The daemon-side change is exercised end-to-end by switching the account-wide setting and observing the daemon pick up the new mode on the next NetworkMap update (handled by UpdatedRemotePeerConfig).

Known limitations (called out in spec section 8)

  1. relay-forced cannot be pushed to old clients because the legacy lazy_connection_enabled boolean cannot express it. The wire field falls back to false and old clients run in normal p2p. Workaround: upgrade the client, or set local NB_FORCE_RELAY=true.
  2. p2p-dynamic is not visible in the Dashboard dropdown in Phase 1 even though the API accepts it; setting it via API gives p2p behaviour until the Phase-2 daemon implementation lands.
  3. Audit events are account-scoped only; per-peer / per-group events follow in Phase 3 (Per-peer and per-group connection-mode and inactivity-threshold override on the management server #5990).

Phase 2 / 3 follow-ups

Test plan

  • All new + existing unit tests in client/internal/... and management/... green
  • TestAccount_GetPeerNetworkMap (existing) remains green -- old peers without connection_mode resolve via ResolveLegacyLazyBool exactly as before
  • TestToPeerConfig_ConnectionModeResolution (new, 9 sub-cases) covers the resolution matrix
  • TestResolveConnectionMode (new, 10 sub-cases) covers the env > config > server precedence chain
  • TestResolveModeFromEnv (new, 9 sub-cases) covers the env-var matrix including the legacy-conflict case
  • Hardware-verified against 32 mixed-version peers in production (no disconnects)

🤖 Generated with Claude Code

Summary by CodeRabbit

  • New Features

    • Configurable peer connection modes (relay-forced, p2p, p2p-lazy, p2p-dynamic) via CLI flags and account settings.
    • Relay and P2P timeout overrides to tune idle behavior.
    • Activity/audit events emitted when connection-mode or timeout settings change.
  • Tests

    • Added unit tests covering mode/timeout resolution, env-var parsing, serialization round-trips, and API validation.

MichaelUray and others added 14 commits May 1, 2026 07:59
…nfig

Additive change for issue netbirdio#5989 Phase 1. New fields use new tag numbers
(11, 12, 13); existing fields (including LazyConnectionEnabled tag 6) are
unchanged so old clients ignore the additions and old servers send
UNSPECIFIED, which the new client maps back via the legacy boolean.

Note: the regenerated pb.go files now report protoc v5.29.3 in their
header (this branch was generated with locally-installed protoc 29.3
instead of upstream's v7.34.1). Functionally identical; header diff is
the only delta beyond the actual schema additions.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Defines Mode enum (relay-forced, p2p, p2p-lazy, p2p-dynamic plus the
client-only sentinels Unspecified and FollowServer), ParseString for
CLI/env input, ToProto/FromProto for wire translation, and the two
backwards-compat helpers ResolveLegacyLazyBool / ToLazyConnectionEnabled
that bridge the old Settings.LazyConnectionEnabled boolean.

Phase 1 of issue netbirdio#5989. Pure addition -- no existing callers touched
in this commit; the engine/conn_mgr migration follows in subsequent
commits in the same PR.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…on warns

NB_CONNECTION_MODE wins over the legacy pair (NB_FORCE_RELAY,
NB_ENABLE_EXPERIMENTAL_LAZY_CONN); when the legacy pair is set
together, NB_FORCE_RELAY wins (most-restrictive, mirrors the
group-conflict rule from issue netbirdio#5990).

Each legacy var emits a one-shot deprecation warning when it actually
contributes to the resolved mode. NB_LAZY_CONN_INACTIVITY_THRESHOLD
becomes an alias for the future relay_timeout setting and warns once.

IsForceRelayed() is kept for callers that have not yet been migrated
(conn.go, statusrecorder); they will be updated in the engine/conn
refactor commits later in this PR.

Phase 1 of issue netbirdio#5989.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Three new CLI flags map onto the new connection-mode plumbing:
- --connection-mode <relay-forced|p2p|p2p-lazy|p2p-dynamic|follow-server>
- --relay-timeout <seconds>
- --p2p-timeout <seconds>

Plumbed through three sites in cmd/up.go (SetConfigRequest, ConfigInput,
LoginRequest), persisted in profilemanager.Config, and added as new
fields on the daemon.proto IPC messages. Empty / not-changed flags fall
back to the server-pushed value (which itself falls back to the legacy
lazy_connection_enabled boolean for old servers).

Phase 1 of issue netbirdio#5989.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
EngineConfig gains ConnectionMode, RelayTimeoutSeconds, P2pTimeoutSeconds.
ConnMgr now stores the resolved Mode plus the raw inputs (env, config)
so it can re-resolve when the server pushes a new PeerConfig.

UpdatedRemoteFeatureFlag is renamed to UpdatedRemotePeerConfig and
takes the full PeerConfig pointer; a thin shim with the old name
delegates to it for callers that haven't been updated yet.

connect.go copies the three new fields from profilemanager.Config into
the EngineConfig builder, with a tolerant parser that logs and falls
through to Unspecified on invalid input.

Phase 1 of issue netbirdio#5989. peer/conn.go forwarding follows in C4.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
ConnConfig gains a Mode field forwarded from the engine. Open() now
checks Mode == ModeRelayForced instead of calling the global env-reader
IsForceRelayed(). The local 'forceRelay' variable name is renamed to
'skipICE' to make the new branching intent explicit.

The PeerStateUpdate block at the end of Open() also reads from
conn.config.Mode now, so the StatusRecorder sees the per-peer mode
rather than the global env var.

A single remaining caller of IsForceRelayed() (srWatcher.Start in
engine.go) is left for a follow-up; that path uses a process-wide flag
not per-peer state, so it can be migrated in Phase 2 once srWatcher
itself learns about ConnectionMode.

Phase 1 of issue netbirdio#5989. Engine forwarding (C5) follows.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
createPeerConn now reads ConnMgr.Mode() and copies it into
peer.ConnConfig, so the per-peer Open() loop in conn.go can take the
ModeRelayForced skip-ICE branch without reading the global env var.

This is the last wiring commit for the client side of Phase 1; the
server-side mgmt changes (Settings + OpenAPI + handler + audit +
NetworkMap-build) follow in Section D.

Phase 1 of issue netbirdio#5989.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
All three fields are nullable to distinguish 'use built-in default'
(NULL) from explicit values (incl. 0 = never tear down). Copy() now
deep-clones the new pointer fields via two small helpers.

GORM AutoMigrate creates the new columns at first start; existing
accounts have NULL in all three columns and resolve via the legacy
LazyConnectionEnabled boolean.

Phase 1 of issue netbirdio#5989.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…tings

Three new optional, nullable fields with descriptions of the NULL =
built-in-default semantics and the Phase-1-vs-Phase-2 status of
p2p-dynamic. Regenerated types.gen.go via the existing oapi-codegen
tooling.

The generated AccountSettingsConnectionMode enum has the canonical
values relay-forced / p2p / p2p-lazy / p2p-dynamic, plus a Valid()
helper for handler-side validation.

Phase 1 of issue netbirdio#5989.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
PUT /api/accounts/{id} now accepts connection_mode (validated against
the four-value enum via the generated AccountSettingsConnectionMode.
Valid()), p2p_timeout_seconds and relay_timeout_seconds. NULL in the
JSON body keeps the existing value untouched (= "no client-side
override on this round-trip"); explicit NULL-clear via API uses a
distinct PATCH-style call which is out-of-scope for Phase 1.

Response payload mirrors the input fields back as nullable so the
dashboard can distinguish "use default" from "explicit value".

Phase 1 of issue netbirdio#5989. Audit-event emission follows in D5.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
AccountConnectionModeChanged (121), AccountRelayTimeoutChanged (122),
AccountP2pTimeoutChanged (123) -- emitted from account.go when settings
change. Per-peer / per-group event codes are reserved for Phase 3
(issue netbirdio#5990).

Phase 1 of issue netbirdio#5989.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
handleConnectionModeSettings is invoked from the same diff-detection
block as handleLazyConnectionSettings; emits one StoreEvent per
changed field (ConnectionMode, RelayTimeoutSeconds, P2pTimeoutSeconds)
with old/new values in the meta payload.

Four small ptr-equality / deref helpers are added for nullable string
and uint32 fields. They are package-private and named after the
existing convention used elsewhere in the package.

Phase 1 of issue netbirdio#5989.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…elds

Two changes in one commit because they're inseparable:

1. Move client/internal/peer/connectionmode/ to shared/connectionmode/.
   The package now needs to be importable from BOTH client/ and
   management/ (which is impossible while it lives under client/internal/
   per Go's internal-package rule). All imports updated; tests pass on
   both sides.

2. Extend management/internals/shared/grpc/conversion.go::toPeerConfig
   to populate the three new PeerConfig fields (ConnectionMode,
   P2PTimeoutSeconds, RelayTimeoutSeconds) using the connectionmode
   helpers. The legacy LazyConnectionEnabled boolean is now derived
   from the resolved Mode via ToLazyConnectionEnabled() rather than
   copied verbatim from Settings -- this is the central backwards-compat
   contract: old clients see only the boolean, new clients prefer the
   explicit enum and ignore the bool.

Resolution rules (Phase 1, account-wide only):
- Settings.ConnectionMode != nil and parses -> wins
- Otherwise -> ResolveLegacyLazyBool(LazyConnectionEnabled)
- timeouts: Settings.RelayTimeoutSeconds / P2pTimeoutSeconds when non-NULL,
  else 0 (= server has no preference; client uses built-in default)

Per-peer / per-group resolution comes in Phase 3 (netbirdio#5990).

Phase 1 of issue netbirdio#5989.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Nine sub-cases cover the Phase-1 resolution matrix from spec section 3:
- no settings -> default (P2P + lazy=false)
- legacy bool only -> mapped via ResolveLegacyLazyBool
- explicit ConnectionMode -> wins over the legacy bool
- timeouts propagate
- garbage ConnectionMode value -> tolerant fallback to legacy bool

Particular attention to the structural compat gap: relay-forced cannot
be expressed via the legacy boolean, so the wire field for old clients
is sent as false. Documented in the spec, asserted here.

Existing TestAccount_GetPeerNetworkMap remains green: existing test
peers have ConnectionMode=NULL in Settings, falls through to the legacy
ResolveLegacyLazyBool(false) -> ModeP2P -> wire bool false.

Phase 1 of issue netbirdio#5989.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented May 1, 2026

📝 Walkthrough

Walkthrough

Adds a connection-mode enum and per-peer timeout overrides end-to-end: CLI flags, env/profile resolution, client conn manager/engine/peer changes, management API/proto/storage updates, and server audit/activity handling for connection-mode and timeout settings.

Changes

Connection Mode & Timeout End-to-End

Layer / File(s) Summary
Data Shape / Wire
shared/management/proto/management.proto, client/proto/daemon.proto, shared/management/http/api/openapi.yml, shared/management/http/api/types.gen.go
Adds ConnectionMode enum and P2pTimeoutSeconds/RelayTimeoutSeconds fields to management proto, client-daemon proto, OpenAPI schema, and generated API types.
Domain Enum & Helpers
shared/connectionmode/mode.go, shared/connectionmode/mode_test.go
Introduces connectionmode.Mode enum, parsing/stringification, proto translation, legacy boolean compatibility, and unit tests.
Server Model & Persistence
management/server/types/settings.go, management/internals/shared/grpc/conversion.go
Adds nullable ConnectionMode/timeout fields to Settings, deep-copy helpers, and maps settings into PeerConfig proto with legacy boolean fallback; includes conversion test.
HTTP API / Validation
management/server/http/handlers/accounts/accounts_handler.go, management/server/http/handlers/accounts/accounts_handler_test.go, shared/management/http/api/types.gen.go
Validates and converts nullable timeout inputs, validates connection_mode, exposes settings in account responses, and adds tests for timeout validation.
Account Update & Activity
management/server/account.go, management/server/activity/codes.go, management/server/account_test.go
Detects peer-impacting changes for connection-mode/timeouts, emits per-field activity events, registers new activity codes, and adds regression test ensuring serial increments on changes.
CLI / Requests
client/cmd/root.go, client/cmd/up.go
Adds experimental persistent flags (connection-mode, relay-timeout, p2p-timeout) and propagates them into login/set-config requests only when flags changed.
Profile Storage
client/internal/profilemanager/config.go, client/internal/profilemanager/config_test.go
Adds optional ConnectionMode and nullable timeout fields to ConfigInput/Config, preserves nil-vs-value semantics in storage, and adds JSON round-trip test for pointer nil-ness.
Env Resolution & Deprecation
client/internal/peer/env.go, client/internal/peer/env_test.go, client/internal/lazyconn/env.go
Adds NB_CONNECTION_MODE env key and ResolveModeFromEnv() with precedence over legacy vars, inactivity threshold parsing, and one-time deprecation warnings; tests added.
Client ConnMgr / Engine / Peer
client/internal/conn_mgr.go, client/internal/conn_mgr_test.go, client/internal/engine.go, client/internal/connect.go, client/internal/peer/conn.go
Refactors conn manager to resolve a Mode+relay timeout (env → client config → server), exposes Mode()/RelayTimeout(), handles UpdatedRemotePeerConfig(), engine now carries/passess resolved mode/timeouts into EngineConfig and peer ConnConfig.Mode, and peer logic conditions ICE/relay behavior on mode.
Management gRPC Conversion Tests
management/internals/shared/grpc/conversion_test.go
Adds tests verifying toPeerConfig resolution precedence and propagation of mode/timeouts to proto.

Sequence Diagram

sequenceDiagram
    actor User
    participant CLI as Client CLI
    participant Env as Env/Profile
    participant ConnMgr as Connection Manager
    participant Engine as Engine
    participant Peer as Peer Conn
    participant Server as Management Server

    User->>CLI: provide flags/env/profile
    CLI->>Engine: start / login (includes flags when changed)
    Env->>ConnMgr: ResolveModeFromEnv()
    CLI->>ConnMgr: profile/config inputs
    ConnMgr->>ConnMgr: resolveConnectionMode(env → config → server)
    ConnMgr->>ConnMgr: set Mode(), RelayTimeout()

    Server->>Engine: updateNetworkMap() (sends PeerConfig w/ ConnectionMode + timeouts)
    Engine->>ConnMgr: UpdatedRemotePeerConfig(pc)
    ConnMgr->>ConnMgr: re-resolve mode (env → config → server)
    ConnMgr->>ConnMgr: start/stop lazy manager on state change

    Engine->>Peer: Create connection with ConnConfig.Mode and timeouts
    Peer->>Peer: conditionally skip ICE/relay logic per Mode
Loading

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Possibly related issues

Possibly related PRs

Suggested reviewers

  • pascal-fischer
  • crn4

Poem

🐇 Hopping through flags and modes I roam,

relay, p2p, lazy — each finds its home.
Timeouts set, configs aligned with care,
Env, server, profile — they all share.
A little rabbit cheers: connections fair!

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 33.33% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Title check ✅ Passed The title accurately and specifically describes the main change: introducing a connection-mode enum and maintaining backwards compatibility (Phase 1 of a three-phase implementation).
Description check ✅ Passed The description provides comprehensive context including a detailed summary, implementation map, hardware testing results, known limitations, and follow-up phases, aligning with the repository template structure.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share
Review rate limit: 7/8 reviews remaining, refill in 7 minutes and 30 seconds.

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 7

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@client/cmd/up.go`:
- Around line 437-439: The connection-mode flag value must be normalized before
persisting or sending: when the flag value (connectionMode variable read via
connectionModeFlag) equals "relay" (case-insensitive, trimmed) convert it to the
canonical "relay-forced" string and then assign that normalized value to
req.ConnectionMode and any profile write paths; leave other values unchanged and
ensure the same normalization is applied in the other occurrences that set
req.ConnectionMode or write the profile (the other blocks referenced around the
same flag usage).

In `@client/internal/peer/env_test.go`:
- Line 25: Update the test case name string in the table entry that currently
reads "connection_mode unparseable falls through to legacy" to use the correct
spelling "connection_mode unparsable falls through to legacy"; locate the tuple
in peer/env_test.go (the table entry with values "garbage", "true",
connectionmode.ModeRelayForced, etc.) and change the first element only to keep
codespell green.

In `@client/internal/peer/env.go`:
- Around line 75-77: Validate the parsed duration from time.ParseDuration(raw)
before assigning to timeoutSecs: ensure the duration is non-negative and that
d.Seconds() does not exceed the maximum uint32 value (so it won't wrap when
cast). Specifically, in the block handling envInactivityThreshold, check d >= 0
and d.Seconds() <= float64(math.MaxUint32) (or compare against
time.Duration(maxUint32Seconds)*time.Second) and only then set timeoutSecs =
uint32(d.Seconds()); otherwise log or handle the invalid value (e.g., ignore the
env value or clamp to max) while still calling
warnDeprecated(envInactivityThreshold, "the relay_timeout setting on the
management server").

In `@client/internal/profilemanager/config.go`:
- Around line 177-179: The timeout fields RelayTimeoutSeconds and
P2pTimeoutSeconds should be made nullable so the persisted profile can
distinguish "unset" from an explicit zero: change their types from uint32 to
*uint32 in the profile struct(s) (e.g., the struct containing ConnectionMode,
RelayTimeoutSeconds, P2pTimeoutSeconds) and keep json:",omitempty" so nil is
omitted while an explicit zero is preserved when set; update any code that
reads/writes these fields (decoders, defaults, and consumers) to handle nil vs
non-nil dereferencing and apply the same change for the corresponding
p2p_timeout_seconds occurrences referenced around lines 604-618.

In `@management/server/account.go`:
- Around line 372-375: UpdateAccountSettings currently doesn't mark
updateAccountPeers or bump the network serial when ConnectionMode,
RelayTimeoutSeconds, or P2pTimeoutSeconds change, so peers won't receive the new
values; inside UpdateAccountSettings (near the existing updateAccountPeers
gating above the calls to am.handleRoutingPeerDNSResolutionSettings /
handleLazyConnectionSettings / handleConnectionModeSettings /
handlePeerLoginExpirationSettings) detect differences on ConnectionMode,
RelayTimeoutSeconds, and P2pTimeoutSeconds between oldSettings and newSettings,
set updateAccountPeers = true (or equivalent), and ensure the account/network
serial is bumped so connected peers are notified of the change.

In `@management/server/http/handlers/accounts/accounts_handler.go`:
- Around line 229-236: The code casts signed API timeout fields
(req.Settings.P2pTimeoutSeconds, req.Settings.RelayTimeoutSeconds) straight to
uint32 which will silently wrap on negative or out-of-range values; add explicit
validation before casting: check each non-nil value is >= 0 and <=
math.MaxUint32 (or 4294967295), and if not return a 400 validation error (use
the same error/response helper used elsewhere in accounts_handler.go). Implement
the checks inline where returnSettings.P2pTimeoutSeconds and
returnSettings.RelayTimeoutSeconds are set (or extract to a small helper
validateTimeout(name string, v *int64) error) and only perform the uint32
conversion after the value passes validation.

In `@shared/management/http/api/types.gen.go`:
- Around line 1517-1521: The handler currently casts pointer timeout fields like
P2pTimeoutSeconds (and the other *int64 timeout fields) directly to uint32,
which allows negative values to wrap; update the account handler code that
performs the cast to first check for nil, then validate the pointed int64 is >=
0 and <= math.MaxUint32, returning a validation error (or reject the request) if
out of range; only then convert safely to uint32. Identify the conversion sites
in the accounts handler where P2pTimeoutSeconds (and the similar timeout fields)
are dereferenced/cast and apply this guard before performing the uint32 cast.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: a58c2ed6-4fbf-4f05-a219-4fc44e13ce33

📥 Commits

Reviewing files that changed from the base of the PR and between c4b2da4 and c272786.

⛔ Files ignored due to path filters (4)
  • client/proto/daemon.pb.go is excluded by !**/*.pb.go
  • client/proto/daemon_grpc.pb.go is excluded by !**/*.pb.go
  • shared/management/proto/management.pb.go is excluded by !**/*.pb.go
  • shared/management/proto/proxy_service.pb.go is excluded by !**/*.pb.go
📒 Files selected for processing (23)
  • client/cmd/root.go
  • client/cmd/up.go
  • client/internal/conn_mgr.go
  • client/internal/conn_mgr_test.go
  • client/internal/connect.go
  • client/internal/engine.go
  • client/internal/lazyconn/env.go
  • client/internal/peer/conn.go
  • client/internal/peer/env.go
  • client/internal/peer/env_test.go
  • client/internal/profilemanager/config.go
  • client/proto/daemon.proto
  • management/internals/shared/grpc/conversion.go
  • management/internals/shared/grpc/conversion_test.go
  • management/server/account.go
  • management/server/activity/codes.go
  • management/server/http/handlers/accounts/accounts_handler.go
  • management/server/types/settings.go
  • shared/connectionmode/mode.go
  • shared/connectionmode/mode_test.go
  • shared/management/http/api/openapi.yml
  • shared/management/http/api/types.gen.go
  • shared/management/proto/management.proto

Comment thread client/cmd/up.go
Comment thread client/internal/peer/env_test.go Outdated
Comment thread client/internal/peer/env.go Outdated
Comment thread client/internal/profilemanager/config.go Outdated
Comment thread management/server/account.go
Comment thread management/server/http/handlers/accounts/accounts_handler.go
Comment thread shared/management/http/api/types.gen.go
MichaelUray and others added 5 commits May 2, 2026 09:36
…changes

UpdateAccountSettings only set updateAccountPeers=true when one of a
fixed list of fields changed (RoutingPeerDNSResolutionEnabled,
LazyConnectionEnabled, DNSDomain, AutoUpdate*). The Phase 1
ConnectionMode plus RelayTimeoutSeconds and P2pTimeoutSeconds were
omitted, so a settings change to any of these was persisted and
audited but never bumped the account network serial -- connected
peers kept running with the old values until an unrelated change or
reconnect picked them up.

Add the three Phase 1 fields to the gating block using the existing
equalStringPtr / equalUint32Ptr helpers (already used a few lines down
in handleConnectionModeSettings for audit emission).

Test:
TestDefaultAccountManager_UpdateAccountSettings_ConnectionModeIncrementsSerial
verifies that:
  - re-applying identical settings does NOT increment Network.Serial
  - changing ConnectionMode increments Serial
  - changing RelayTimeoutSeconds increments Serial
  - changing P2pTimeoutSeconds increments Serial

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…ode null semantics

parseAccountSettings narrowed *int64 timeout fields straight to uint32
without checking either end of the range. Negative values wrapped to a
huge positive uint32, and values above math.MaxUint32 silently
truncated. Both became persisted account state.

Add validateUint32Timeout(field, ptr) -> (*uint32, error) and use it
for p2p_timeout_seconds and relay_timeout_seconds. Returns a clear
"%s must be >= 0 / <= MaxUint32" error so the API caller sees the
problem instead of getting a corrupted setting saved.

Also correct the ConnectionMode block comment: the previous wording
implied JSON null could clear an existing override. The
*api.AccountSettingsConnectionMode pointer cannot distinguish absent
from explicit null at this layer, so both leave the stored value
untouched. Document that and flag the API gap for a future revision.

Test:
TestValidateUint32Timeout covers nil, zero, common positive, MaxUint32
boundary, negative-rejected, and over-MaxUint32 rejected.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…spell

ResolveModeFromEnv parsed NB_LAZY_CONN_INACTIVITY_THRESHOLD with
time.ParseDuration and then narrowed straight to uint32 via
uint32(d.Seconds()). Two ways that silently produced wrong state:

  - negative durations (e.g. "-30s") wrapped to a huge positive uint32
  - durations larger than ~136 years truncated when seconds overflowed
    uint32

Reject both with a warn-and-ignore so the daemon falls back to the
server-pushed value or the built-in default instead of running with a
nonsensical relay timeout.

Test additions:
  - "unparsable inactivity duration is ignored" (existing case renamed
    from "unparseable" so codespell doesn't trip on the s/ss spelling)
  - "negative inactivity duration is ignored"
  - "oversized inactivity duration is ignored" (2000000h, > MaxUint32 s)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…int32

The profile Config struct held both timeout fields as
`uint32 \`json:",omitempty"\`` which conflated three states the rest
of the code already cares about:

  - "no local override, follow whatever the server pushes"
  - "explicit zero" (a future "disable this timer" sentinel)
  - "explicit non-zero N seconds"

omitempty drops zero values on save, so a user who explicitly typed
`--relay-timeout 0` would see that intent erased the next time the
profile JSON was rehydrated.

Switch the persistence type to `*uint32`. ConfigInput already used
pointer types so the input-side semantics stay the same; the new
nil/value distinction is now preserved across save/load. connect.go
flattens to the runtime uint32 expected by EngineConfig via a small
derefUint32 helper -- the resolve chain in conn_mgr.go still uses 0
as "fall through" and is left untouched here, so a profile with
explicit-zero is not yet honored at runtime, but the on-disk data
is at least honest for when that resolve gets nullable-aware.

Test:
TestProfileTimeoutsNullableRoundtrip JSON marshals and unmarshals a
matrix of {nil, explicit zero, non-zero} combinations and verifies
nil-ness plus value survives the roundtrip.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
ParseString only accepted the canonical "relay-forced" form, even
though older CLI docs and the dashboard surface "Relay" as the user-
facing label. A user passing --connection-mode relay got back an
"unknown connection mode" error and silently fell through to
ModeUnspecified, which was confusing given the spec uses both names
interchangeably.

Map both "relay-forced" and "relay" to ModeRelayForced. The canonical
persisted form is unchanged (Mode.String() still returns
"relay-forced"); the alias is only honored on parse so a config save
+ reload normalises to canonical.

Tests cover the alias plus the existing case-insensitive and
whitespace-tolerant behaviour.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@sonarqubecloud
Copy link
Copy Markdown

sonarqubecloud Bot commented May 2, 2026

Quality Gate Failed Quality Gate failed

Failed conditions
5 New issues
5 New Code Smells (required ≤ 0)

See analysis details on SonarQube Cloud

Catch issues before they fail your Quality Gate with our IDE extension SonarQube for IDE

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 4

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@client/internal/peer/env.go`:
- Around line 55-61: The env parsing branch for EnvKeyNBConnectionMode currently
accepts connectionmode.ModeFollowServer; update the logic after calling
connectionmode.ParseString(raw) (used in this block and the variable mode) to
treat connectionmode.ModeFollowServer the same as
connectionmode.ModeUnspecified: either set mode to parsed only if parsed !=
connectionmode.ModeUnspecified && parsed != connectionmode.ModeFollowServer, or
log a warning and ignore parsed when parsed == connectionmode.ModeFollowServer
so it falls through to server/default resolution; keep using
connectionmode.ParseString and the existing mode variable and warn with
log.Warnf when ignoring follow-server.

In `@management/server/account.go`:
- Around line 462-466: The audit metadata currently collapses nil and 0 for
RelayTimeoutSeconds and P2pTimeoutSeconds which makes transitions ambiguous;
update handleConnectionModeSettings (and any code building the meta payload for
connection-mode changes) to preserve the null state explicitly by storing nil in
the metadata (or include an explicit boolean flag like
RelayTimeoutSecondsIsNil/P2pTimeoutSecondsIsNil) rather than coercing nil to 0,
and ensure both old and new values in the meta payload reflect nil vs numeric
values so audit events show "nil -> 0" correctly.

In `@management/server/http/handlers/accounts/accounts_handler.go`:
- Around line 29-41: The validation currently returns plain fmt.Errorf which
bypasses the handler's status mapping; update validateUint32Timeout to return
structured gRPC-like errors using status.Errorf(status.InvalidArgument, ...) for
both the "<field> must be >= 0" and "<field> must be <= ..." branches (replace
fmt.Errorf calls), and do the same for the connection_mode validation referred
to around the connection_mode checks (lines that validate connection_mode) so
those branches return status.Errorf(status.InvalidArgument, ...) with the same
messages; keep the function signature and returned types unchanged
(validateUint32Timeout, and the connection_mode validation logic).
- Around line 243-266: The handler currently treats omitted and explicit JSON
null the same because req.Settings uses plain pointer fields (connection_mode,
P2pTimeoutSeconds, RelayTimeoutSeconds), so the PUT cannot clear stored
overrides; change to presence-aware decoding (e.g., make the request Settings
struct use json.RawMessage or a small nullable wrapper type that records
"present" vs "absent", or add explicit boolean "Has..." flags) and update the
handler logic around validateUint32Timeout, returnSettings and connection_mode
handling to branch on presence: if the field is present and null -> set the
corresponding returnSettings.* to nil to clear the override; if present and
non-null -> validate/assign as now; if absent -> leave existing value untouched.
Ensure the handler inspects the new presence-aware symbols (the Settings
raw/json fields or HasP2pTimeoutSeconds/HasConnectionMode flags) rather than
relying on pointer nil to decide whether to clear vs omit.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 0f25c092-a93d-47d1-b2ed-c1746971259d

📥 Commits

Reviewing files that changed from the base of the PR and between c272786 and 5ef9b09.

📒 Files selected for processing (11)
  • client/internal/connect.go
  • client/internal/peer/env.go
  • client/internal/peer/env_test.go
  • client/internal/profilemanager/config.go
  • client/internal/profilemanager/config_test.go
  • management/server/account.go
  • management/server/account_test.go
  • management/server/http/handlers/accounts/accounts_handler.go
  • management/server/http/handlers/accounts/accounts_handler_test.go
  • shared/connectionmode/mode.go
  • shared/connectionmode/mode_test.go
✅ Files skipped from review due to trivial changes (3)
  • shared/connectionmode/mode.go
  • shared/connectionmode/mode_test.go
  • client/internal/peer/env_test.go

Comment on lines +55 to +61
if raw := os.Getenv(EnvKeyNBConnectionMode); raw != "" {
parsed, err := connectionmode.ParseString(raw)
if err != nil {
log.Warnf("ignoring %s=%q: %v", EnvKeyNBConnectionMode, raw, err)
} else if parsed != connectionmode.ModeUnspecified {
mode = parsed
}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Reject follow-server in env resolution.

connectionmode.ModeFollowServer is a config-only sentinel, but this branch currently returns it for NB_CONNECTION_MODE. client/internal/conn_mgr.go only strips that sentinel from config values, so NB_CONNECTION_MODE=follow-server becomes the final mode instead of falling through to server/default resolution. Treat it as ModeUnspecified here, or ignore it with a warning.

Suggested fix
 	if raw := os.Getenv(EnvKeyNBConnectionMode); raw != "" {
 		parsed, err := connectionmode.ParseString(raw)
 		if err != nil {
 			log.Warnf("ignoring %s=%q: %v", EnvKeyNBConnectionMode, raw, err)
+		} else if parsed == connectionmode.ModeFollowServer {
+			log.Warnf("ignoring %s=%q: follow-server is only valid in client config", EnvKeyNBConnectionMode, raw)
 		} else if parsed != connectionmode.ModeUnspecified {
 			mode = parsed
 		}
 	}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@client/internal/peer/env.go` around lines 55 - 61, The env parsing branch for
EnvKeyNBConnectionMode currently accepts connectionmode.ModeFollowServer; update
the logic after calling connectionmode.ParseString(raw) (used in this block and
the variable mode) to treat connectionmode.ModeFollowServer the same as
connectionmode.ModeUnspecified: either set mode to parsed only if parsed !=
connectionmode.ModeUnspecified && parsed != connectionmode.ModeFollowServer, or
log a warning and ignore parsed when parsed == connectionmode.ModeFollowServer
so it falls through to server/default resolution; keep using
connectionmode.ParseString and the existing mode variable and warn with
log.Warnf when ignoring follow-server.

Comment on lines +462 to +466
// handleConnectionModeSettings emits one audit event per changed Phase-1
// connection-mode setting (mode, relay timeout, p2p timeout). Each event
// carries old/new values in the meta payload so administrators can audit
// the full transition. NULL transitions show as empty string / 0 in the
// meta — chosen over a sentinel so the frontend can render uniformly.
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Preserve NULL versus 0 in timeout audit events.

RelayTimeoutSeconds and P2pTimeoutSeconds use nil for “built-in default”, but 0 is also a valid configured value. Collapsing both to 0 makes the audit trail ambiguous; a nil -> 0 change is currently logged as 0 -> 0. Store nil explicitly in the metadata or add null-state flags alongside the numeric values.

Suggested direction
+func auditUint32Meta(p *uint32) any {
+	if p == nil {
+		return nil
+	}
+	return *p
+}
+
 func (am *DefaultAccountManager) handleConnectionModeSettings(ctx context.Context, oldSettings, newSettings *types.Settings, userID, accountID string) {
 	…
 	if !equalUint32Ptr(oldSettings.RelayTimeoutSeconds, newSettings.RelayTimeoutSeconds) {
 		am.StoreEvent(ctx, userID, accountID, accountID, activity.AccountRelayTimeoutChanged, map[string]any{
-			"old": derefUint32Ptr(oldSettings.RelayTimeoutSeconds),
-			"new": derefUint32Ptr(newSettings.RelayTimeoutSeconds),
+			"old":         auditUint32Meta(oldSettings.RelayTimeoutSeconds),
+			"new":         auditUint32Meta(newSettings.RelayTimeoutSeconds),
+			"old_is_null": oldSettings.RelayTimeoutSeconds == nil,
+			"new_is_null": newSettings.RelayTimeoutSeconds == nil,
 		})
 	}
 	if !equalUint32Ptr(oldSettings.P2pTimeoutSeconds, newSettings.P2pTimeoutSeconds) {
 		am.StoreEvent(ctx, userID, accountID, accountID, activity.AccountP2pTimeoutChanged, map[string]any{
-			"old": derefUint32Ptr(oldSettings.P2pTimeoutSeconds),
-			"new": derefUint32Ptr(newSettings.P2pTimeoutSeconds),
+			"old":         auditUint32Meta(oldSettings.P2pTimeoutSeconds),
+			"new":         auditUint32Meta(newSettings.P2pTimeoutSeconds),
+			"old_is_null": oldSettings.P2pTimeoutSeconds == nil,
+			"new_is_null": newSettings.P2pTimeoutSeconds == nil,
 		})
 	}
 }

Also applies to: 474-483

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@management/server/account.go` around lines 462 - 466, The audit metadata
currently collapses nil and 0 for RelayTimeoutSeconds and P2pTimeoutSeconds
which makes transitions ambiguous; update handleConnectionModeSettings (and any
code building the meta payload for connection-mode changes) to preserve the null
state explicitly by storing nil in the metadata (or include an explicit boolean
flag like RelayTimeoutSecondsIsNil/P2pTimeoutSecondsIsNil) rather than coercing
nil to 0, and ensure both old and new values in the meta payload reflect nil vs
numeric values so audit events show "nil -> 0" correctly.

Comment on lines +29 to +41
func validateUint32Timeout(field string, ptr *int64) (*uint32, error) {
if ptr == nil {
return nil, nil
}
v := *ptr
if v < 0 {
return nil, fmt.Errorf("%s must be >= 0, got %d", field, v)
}
if v > int64(math.MaxUint32) {
return nil, fmt.Errorf("%s must be <= %d, got %d", field, uint32(math.MaxUint32), v)
}
out := uint32(v)
return &out, nil
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Return structured invalid-argument errors here.

These branches build plain fmt.Errorf values for client validation failures, so they bypass the handler’s normal status.InvalidArgument path and surface as generic server errors instead of a 4xx. Use status.Errorf(status.InvalidArgument, …) for both timeout bounds and connection_mode validation.

Suggested fix
 func validateUint32Timeout(field string, ptr *int64) (*uint32, error) {
 	…
 	if v < 0 {
-		return nil, fmt.Errorf("%s must be >= 0, got %d", field, v)
+		return nil, status.Errorf(status.InvalidArgument, "%s must be >= 0, got %d", field, v)
 	}
 	if v > int64(math.MaxUint32) {
-		return nil, fmt.Errorf("%s must be <= %d, got %d", field, uint32(math.MaxUint32), v)
+		return nil, status.Errorf(status.InvalidArgument, "%s must be <= %d, got %d", field, uint32(math.MaxUint32), v)
 	}
 	…
 }
 
 if req.Settings.ConnectionMode != nil {
 	modeStr := string(*req.Settings.ConnectionMode)
 	if !req.Settings.ConnectionMode.Valid() {
-		return nil, fmt.Errorf("invalid connection_mode %q", modeStr)
+		return nil, status.Errorf(status.InvalidArgument, "invalid connection_mode %q", modeStr)
 	}
 	…
 }

Also applies to: 238-241

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@management/server/http/handlers/accounts/accounts_handler.go` around lines 29
- 41, The validation currently returns plain fmt.Errorf which bypasses the
handler's status mapping; update validateUint32Timeout to return structured
gRPC-like errors using status.Errorf(status.InvalidArgument, ...) for both the
"<field> must be >= 0" and "<field> must be <= ..." branches (replace fmt.Errorf
calls), and do the same for the connection_mode validation referred to around
the connection_mode checks (lines that validate connection_mode) so those
branches return status.Errorf(status.InvalidArgument, ...) with the same
messages; keep the function signature and returned types unchanged
(validateUint32Timeout, and the connection_mode validation logic).

Comment on lines +243 to +266
// Persist as the canonical string. Note: the request type uses a
// non-pointer `*api.AccountSettingsConnectionMode` and JSON null
// is indistinguishable from an absent field at this layer -- both
// land here as a nil pointer and skip this whole block, leaving
// the existing value untouched. There is currently no API path
// that lets a client *clear* an explicit override; the next
// settings revision should switch to a sentinel-aware wrapper.
s := modeStr
returnSettings.ConnectionMode = &s
}
if req.Settings.P2pTimeoutSeconds != nil {
v, err := validateUint32Timeout("p2p_timeout_seconds", req.Settings.P2pTimeoutSeconds)
if err != nil {
return nil, err
}
returnSettings.P2pTimeoutSeconds = v
}
if req.Settings.RelayTimeoutSeconds != nil {
v, err := validateUint32Timeout("relay_timeout_seconds", req.Settings.RelayTimeoutSeconds)
if err != nil {
return nil, err
}
returnSettings.RelayTimeoutSeconds = v
}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | 🏗️ Heavy lift

This endpoint still can't clear the new nullable overrides.

connection_mode, p2p_timeout_seconds, and relay_timeout_seconds all collapse “field omitted” and “explicit JSON null” into the same nil value here. That means once one of these settings is stored, this PUT handler has no way to reset it back to NULL/built-in-default semantics, which breaks the nullable contract introduced in this PR. This needs presence-aware decoding rather than plain pointer fields.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@management/server/http/handlers/accounts/accounts_handler.go` around lines
243 - 266, The handler currently treats omitted and explicit JSON null the same
because req.Settings uses plain pointer fields (connection_mode,
P2pTimeoutSeconds, RelayTimeoutSeconds), so the PUT cannot clear stored
overrides; change to presence-aware decoding (e.g., make the request Settings
struct use json.RawMessage or a small nullable wrapper type that records
"present" vs "absent", or add explicit boolean "Has..." flags) and update the
handler logic around validateUint32Timeout, returnSettings and connection_mode
handling to branch on presence: if the field is present and null -> set the
corresponding returnSettings.* to nil to clear the override; if present and
non-null -> validate/assign as now; if absent -> leave existing value untouched.
Ensure the handler inspects the new presence-aware symbols (the Settings
raw/json fields or HasP2pTimeoutSeconds/HasConnectionMode flags) rather than
relying on pointer nil to decide whether to clear vs omit.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant